Fault Tolerance Assistant (FTA): An Exception Handling Programming Model for MPI Applications

Authors

  • Aiman Fang
  • Ignacio Laguna
  • Kento Sato
  • Tanzima Islam
  • Kathryn Mohror

Abstract

We propose FTA, a programming model that provides failure localization and transparent recovery of process failures in MPI applications.

Similar resources

Fault Tolerance Assistant (FTA): An Exception Handling Programming Model for MPI Applications

Future high-performance computing systems may face frequent failures with their rapid increase in scale and complexity. Resilience to faults has become a major challenge for large-scale applications running on supercomputers, which demands fault tolerance support for prevalent MPI applications. Among failure scenarios, process failures are one of the most severe issues as they usually lead to t...

Implementing Coordinated Exception Handling for Distributed Object-Oriented Systems with AspectJ

Exception handling is a very popular technique for incorporating fault tolerance into software systems. However, its use for structuring concurrent, distributed systems is hindered by the fact that the exception handling models of many mainstream object-oriented programming languages are sequential. In this paper we present an aspect-based framework for incorporating concurrent exception handli...

Verification of Coordinated Exception Handling

An important challenge faced by the developers of fault-tolerant distributed systems is to build fault tolerance mechanisms that are reliable. To achieve the desired levels of reliability, the development of mechanisms for detecting and handling errors should be rigorous or formal. In this paper, we present an approach to modeling and verifying fault-tolerant distributed systems that use exceptio...

Implementing Coordinated Error Recovery for Distributed Object-Oriented Systems with AspectJ

Exception handling is a very popular technique for incorporating fault tolerance into software systems. However, its use for structuring concurrent, distributed systems is hindered by the fact that the exception handling models of many mainstream object-oriented programming languages are sequential. In this paper we present an aspect-based framework for incorporating concurrent exception handli...

Towards a Multi Agents System Coupling Replication and Exception Handling

Multi-agent systems are made up of independent entities placed on several machines. When an entity or agent fails, the whole system may fail. In this paper, we propose an approach that may guarantee fault tolerance in multi-agent systems using two different techniques: replication and exception handling. Replication uses redundancy to ...

Publication date: 2015